Formal Structure of Sanskrit Text: Requirements Analysis for a Mechanical Sanskrit Processor

Author

  • Gérard P. Huet
Abstract

We discuss the mathematical structure of various levels of representation of Sanskrit text in order to guide the design of computer aids for useful processing of the digitised Sanskrit corpus. Two main levels are identified, called the linear level and the functional level. The design space of these two levels is sketched, and the computational implications of the main design choices are discussed. Current solutions to the problems of mechanical segmentation, tagging, and parsing of Sanskrit text are briefly surveyed in this light. An analysis of the requirements of relevant linguistic resources is provided, with a view to justifying standards that allow interoperability of computer tools. This paper does not attempt to provide definitive solutions to the representation of Sanskrit at the various levels. It should rather be considered a survey of various choices, allowing an open discussion of such issues in a formally precise general framework.


Similar resources

Extending the core functionalities of Aṣṭādhyāyī 2.0

The paper describes new layers of linguistic annotation and explorative tools that were added to the project ‘Aṣṭādhyāyī 2.0’. These additions make it possible to execute complex research queries in the digital version of Pāṇini’s grammar with minimal knowledge both of Sanskrit and database query languages. In the project ‘Aṣṭādhyāyī 2.0’, we have developed a digital edition of Pāṇini’s grammar...


SanskritTagger: A Stochastic Lexical and POS Tagger for Sanskrit

SanskritTagger is a stochastic tagger for unpreprocessed Sanskrit text. The tagger tokenises text with a Markov model and performs part-of-speech tagging with a Hidden Markov model. Parameters for these processes are estimated from a manually annotated corpus that currently contains about 1,500,000 words. The article sketches the tagging process, reports the results of tagging a few short passages of Sans...


Building a Prototype Text to Speech for Sanskrit

This paper describes the work done in building a prototype text-to-speech system for Sanskrit. A basic prototype text-to-speech system is built using a simplified Sanskrit phone set and a unit selection technique, in which prerecorded sub-word units are concatenated to synthesize a sentence. We also discuss the issues involved in building a full-fledged text-to-speech system for Sanskrit.


Analysis of Sanskrit Text: Parsing and Semantic Relations

In this paper, we present our work towards building a dependency parser for the Sanskrit language that uses deterministic finite automata (DFA) for morphological analysis and the ‘utsarga apavaada’ approach for relation analysis. A computational grammar based on the framework of Panini is being developed. A linguistic generalization for Verbal and Nominal database has been made and declensions ar...


Annotating and Analyzing the Aṣṭādhyāyī

The paper introduces the new research project ‘Aṣṭādhyāyī 2.0’, which aims at developing a digital edition of the Aṣṭādhyāyī, Pāṇini’s nearly 2,500-year-old grammar of Sanskrit, the ancient Indian language. For modern linguists this grammar is interesting for two reasons. First, its Western (re-)discovery in the 19th century had an enormous influence on contemporary linguistics. For example, the ...




Publication date: 2008